 online adaptation





Pre-Trained Multi-Goal Transformers with Prompt Optimization for Efficient Online Adaptation

Neural Information Processing Systems

Efficiently solving unseen tasks remains a challenge in reinforcement learning (RL), especially for long-horizon tasks composed of multiple subtasks. Pre-training policies from task-agnostic datasets has emerged as a promising approach, yet existing methods still necessitate substantial interactions via RL to learn new tasks. We introduce MGPO, a method that leverages the power of Transformer-based policies to model sequences of goals, enabling efficient online adaptation through prompt optimization. In its pre-training phase, MGPO utilizes hindsight multi-goal relabeling and behavior cloning. This combination equips the policy to model diverse long-horizon behaviors that align with varying goal sequences. During online adaptation, the goal sequence, conceptualized as a prompt, is optimized to improve task performance. We adopt a multi-armed bandit framework for this process, enhancing prompt selection based on the returns from online trajectories. Our experiments across various environments demonstrate that MGPO holds substantial advantages in sample efficiency, online adaptation performance, robustness, and interpretability compared with existing methods.
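
The bandit-based prompt selection described in this abstract can be illustrated with a minimal sketch. The abstract does not say which bandit algorithm MGPO uses, so UCB1 is an assumption here, and the names `ucb1_select`, `optimize_prompt`, and `rollout` are hypothetical. Each arm is one candidate goal sequence (prompt), and `rollout` stands in for running the pre-trained policy with that prompt and returning the episode return.

```python
import math

def ucb1_select(counts, means, t, c=2.0):
    """Pick the next arm (candidate prompt) with the UCB1 rule (an assumed choice)."""
    # Pull every arm once before relying on confidence bounds.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)

def optimize_prompt(prompts, rollout, rounds):
    """Spend `rounds` online episodes choosing among candidate goal sequences."""
    counts = [0] * len(prompts)   # episodes spent on each prompt
    means = [0.0] * len(prompts)  # running mean return of each prompt
    for t in range(1, rounds + 1):
        i = ucb1_select(counts, means, t)
        ret = rollout(prompts[i])               # hypothetical: one online episode
        counts[i] += 1
        means[i] += (ret - means[i]) / counts[i]  # incremental mean update
    best = max(range(len(prompts)), key=means.__getitem__)
    return prompts[best], counts, means

# Toy usage: three goal sequences with fixed, made-up returns.
toy_returns = {("g1", "g2"): 0.1, ("g2", "g3"): 0.9, ("g1", "g3"): 0.4}
best_prompt, counts, means = optimize_prompt(list(toy_returns), toy_returns.get, 200)
```

In this toy setting most of the episode budget ends up on the highest-return goal sequence, which is the sample-efficiency argument for treating prompt search as a bandit problem rather than fine-tuning the policy weights.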


Make One-Shot Video Object Segmentation Efficient Again

Neural Information Processing Systems

Video object segmentation (VOS) describes the task of segmenting a set of objects in each frame of a video. In the semi-supervised setting, the first mask of each object is provided at test time. Following the one-shot principle, fine-tuning VOS methods train a segmentation model separately on each given object mask. However, the VOS community has recently deemed such test-time optimization, and its impact on test runtime, infeasible. To mitigate the inefficiencies of previous fine-tuning approaches, we present efficient One-Shot Video Object Segmentation (e-OSVOS). In contrast to most VOS approaches, e-OSVOS decouples the object detection task and predicts only local segmentation masks by applying a modified version of Mask R-CNN. The one-shot test runtime and performance are optimized without a laborious and handcrafted hyperparameter search. To this end, we meta-learn the model initialization and learning rates for the test-time optimization. To achieve an optimal learning behavior, we predict individual learning rates at a neuron level.


Online Adaptation to Label Distribution Shift

Neural Information Processing Systems

Machine learning models often encounter distribution shifts when deployed in the real world. In this paper, we focus on adaptation to label distribution shift in the online setting, where the test-time label distribution is continually changing and the model must dynamically adapt to it without observing the true label. This setting is common in many real-world scenarios such as medical diagnosis, where disease prevalences can vary substantially at different times of the year. Leveraging a novel analysis, we show that the lack of true labels does not hinder estimation of the expected test loss, which enables the reduction of online label shift adaptation to conventional online learning. Informed by this observation, we propose adaptation algorithms inspired by classical online learning techniques such as Follow The Leader (FTL) and Online Gradient Descent (OGD) and derive their regret bounds. We empirically verify our findings under both simulated and real-world label distribution shifts and show that OGD is particularly effective and robust to a variety of challenging label shift scenarios.
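
As a rough illustration of the OGD-style adaptation mentioned in this abstract, the sketch below keeps an estimate `q` of the current test label distribution, takes a gradient step on it, and projects back onto the probability simplex. It is a sketch under stated assumptions, not the paper's algorithm: the gradient is passed in by the caller, since the label-free loss estimator that would produce it is not described in the abstract, and the prior re-weighting in `reweight` is the standard Bayes correction for a changed class prior. All function names are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        if uj - t > 0:          # holds for a prefix of j; keep the last threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]

def ogd_step(q, grad, lr):
    """One projected online gradient descent step on the label-distribution estimate."""
    return project_to_simplex([qi - lr * gi for qi, gi in zip(q, grad)])

def reweight(probs, q_src, q_est):
    """Adjust classifier probabilities from the training prior q_src to the estimate q_est."""
    w = [p * qe / qs for p, qs, qe in zip(probs, q_src, q_est)]
    z = sum(w)
    return [x / z for x in w]

# Toy usage with made-up numbers: two classes, uniform training prior.
q = ogd_step([0.5, 0.5], grad=[1.0, -1.0], lr=0.1)    # estimate shifts toward class 1
adjusted = reweight([0.5, 0.5], q_src=[0.5, 0.5], q_est=q)
```

The projection step is what keeps `q` a valid distribution after each update; without it, repeated gradient steps could drive entries negative or make them sum to something other than one.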


OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding

Ramakrishnan, Ramchalam Kinattinkara, Yuan, Zhaocong, Zhuo, Shaojie, Feng, Chen, Lin, Yicheng, Su, Chenzheng, Zhang, Xiaopeng

arXiv.org Artificial Intelligence

Speculative decoding generally dictates having a small, efficient draft model that is either pretrained or distilled offline to a particular target model series, for instance, Llama or Qwen models. However, within online deployment settings, there are two major challenges: 1) usage of a target model that is incompatible with the draft model; 2) expectation of latency improvements over usage and time. In this work, we propose OmniDraft, a unified framework that enables a single draft model to operate with any target model and adapt dynamically to user data. We introduce an online n-gram cache with hybrid distillation fine-tuning to address the cross-vocabulary mismatch across draft and target models; and further improve decoding speed by leveraging adaptive drafting techniques. OmniDraft is particularly suitable for on-device LLM applications where model cost, efficiency and user customization are the major points of contention. This further highlights the need to tackle the above challenges and motivates the "one drafter for all" paradigm. We showcase the proficiency of the OmniDraft framework by performing online learning on math reasoning, coding and text generation tasks. Notably, OmniDraft enables a single Llama-68M model to pair with various target models including Vicuna-7B, Qwen2-7B and Llama3-8B models for speculative decoding; and additionally provides up to 1.5-2x speedup.